<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
    <link rel="icon" href="../images/logo/logo.png" type="image/x-icon">
    <link rel="shortcut icon" href="../images/logo/logo.png"
          type="image/x-icon">
    <title>Liuyang Deta Software Development Co., Ltd. | Nüwa Project</title>
</head>
<body style="Max-width: 700px; text-align:center; margin:auto;">
<div style="text-align:left; Max-width: 680px; margin-left:15px;">
    <a href="../">Previous page</a>
    <br/>
    <br/>
    <br/>Chapter 1: Deta Natural Language Turing System
    <br/> Author: Yaoguang Luo (罗瑶光)<br/>
    <br/> Basic application: the language analyzer of the 元基催化与肽计算 (meta-base catalysis and peptide computing) compiler
    <br/>
    NLP <br/>
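    Deta Parser's natural language processing works mainly by length-based pruning over an index forest
    of lexicons. Chinese word forms are fairly uniform, unlike the vowel patterns of Western languages,
    where, for example, the Flesch readability measure judges word difficulty by the vowel content of a
    word. Chinese words are generally single-character classical words, two-character common words,
    three-character colloquialisms, or four-character idioms; words of five or more characters are
    usually proverbs or fixed phrases. In a sense, these phrases of five or more characters can in turn
    be split into words of one to four characters. For example, the nine characters '巧媳妇难逃无米之炊'
    may appear as a single proverb, but they can also be segmented as '巧 + 媳妇 + 难 + 逃 + 无米之炊'.
    Mr. Yaoguang Luo therefore set the maximum word length to 4. While preserving segmentation accuracy,
    he ordered the pipeline gates by statistics and found that two-character and single-character words
    occur relatively frequently in random articles, so the handlers for 2- and 1-character words were
    placed first. In this way Deta's NLP flow-gate segmentation functions gradually took shape, and the
    flow gates of Deta POS inherited the same high-frequency-first approach. Described by Yaoguang Luo
    (罗瑶光). <br/>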
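    <br/>
    The length-capped lookup described above can be sketched as follows. This is a minimal sketch, not
    Deta Parser's actual code: it assumes plain forward maximum matching (try the longest candidate of at
    most 4 characters first, and fall back to a single character), and a Python dict of sets keyed by word
    length stands in for the lexicon index forest. The names build_length_index and segment and the
    five-word lexicon are hypothetical, and the high-frequency ordering of Deta's gate functions is not
    modeled here. <br/>
    <pre>
```python
# Minimal sketch (not the Deta Parser source): forward maximum matching
# over a lexicon grouped by word length, with the word length capped at 4.

MAX_LEN = 4  # maximum word length, as set in the text


def build_length_index(words):
    """Group lexicon entries by length, e.g. {1: {...}, 2: {...}, 4: {...}}.
    A plain dict of sets stands in for Deta's lexicon index forest."""
    index = {}
    for word in words:
        index.setdefault(len(word), set()).add(word)
    return index


def segment(text, index, max_len=MAX_LEN):
    """Scan left to right, trying the longest candidate (at most max_len
    characters) first; a single character is always accepted as a fallback."""
    tokens = []
    i = 0
    n = len(text)
    while i != n:  # each step advances i by 1..max_len without passing n
        for length in range(min(max_len, n - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in index.get(length, ()):
                tokens.append(candidate)
                i += length
                break
    return tokens


# Hypothetical five-word lexicon; '难逃' is deliberately absent.
LEXICON = ["巧", "媳妇", "难", "逃", "无米之炊"]
INDEX = build_length_index(LEXICON)
print(segment("巧媳妇难逃无米之炊", INDEX))
```
    </pre>
    Run on the nine-character example, it prints ['巧', '媳妇', '难', '逃', '无米之炊'], the same
    segmentation as in the text; this holds only because '难逃' is left out of the hypothetical lexicon. <br/>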
    <br/>
    Deta NLP <br/>
    The natural language processing of Deta Parser is based on a map forest of lexicons indexed by
    word length. A Chinese word is formed from consecutive Chinese characters, which is quite different
    from European lexicons, where a Flesch-style measure relates the number of vowels in a word to its
    length. The length of a Chinese word commonly falls into four types: one character (a classical word
    or singleton), two characters (a simple word), three characters (a special word or slang), and four
    characters (an idiom or slang), plus longer ones. An example of a longer one is '巧媳妇难逃无米之炊':
    although these nine characters form a single saying, they can be separated into the token list
    '巧' + '媳妇' + '难' + '逃' + '无米之炊'. The Deta parser can therefore easily recognize this token
    list by using 'Dynamic River Flows Gate Function Marching and Circustantly Loop the POS Kernel
    Computing'. The token list above consists mostly of one- and two-character words ('巧', '媳妇', '难',
    '逃'), so the class of one-character words is processed with higher priority than the class of idioms
    and slang. The author regards this as an evolutionary principle of giving priority to high
    frequency. <br/>
    Author: Yaoguang Luo (grammar to be polished later). <br/>
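    <br/>
    The statistical ordering behind the high-frequency-first priority can be illustrated by counting
    token lengths. In this hedged sketch, length_priority is a hypothetical helper and its only input is
    the token list from the example above; the real ordering was derived from random articles, which are
    not reproduced here, and how Deta wires its gate functions in that order is not shown. <br/>
    <pre>
```python
from collections import Counter


def length_priority(token_lists):
    """Return word lengths ordered from most to least frequent.

    Ties are broken toward the shorter length, so the sort key is
    (negative count, length).
    """
    counts = Counter(len(tok) for tokens in token_lists for tok in tokens)
    return sorted(counts, key=lambda length: (-counts[length], length))


# Only the token list from the example above; a real ordering would be
# measured on a large corpus of random articles.
SAMPLE = [["巧", "媳妇", "难", "逃", "无米之炊"]]
priority = length_priority(SAMPLE)
print(priority)  # prints [1, 2, 4]
```
    </pre>
    The one-character class comes first because three of the five tokens have length 1, which matches
    the priority described above; the lengths 2 and 4 each occur once and are ordered shorter first. <br/>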
    <br/>
    <img class="banner_img" style="width: 100%" src="../images/5_7108/1/1_9.jpg"
         alt="Liuyang Deta Software Development Co., Ltd., Yaoguang Luo"/>
    1. The core class of Deta word segmentation, containing all of its part-of-speech word-length splitting functions. See pages 119 and 120. <br/>

</div>
</body>